Speech signal pitch detector using prediction error data

ABSTRACT

Pitch periods in a complex speech signal are determined by evaluating the error in predicting the value of a sample of the signal on the basis of past sample values, and by locating samples for which the prediction error is large. Advantageously, the prediction error signal is devoid of all formant structure, so that there is no chance of confusing pitch signal peaks with formant peaks. A voiced-unvoiced decision is obtained from the ratio of the mean-squared value of the speech signal to the mean-squared value of the prediction error signal.

United States Patent
Atal
June 19, 1973

[54] SPEECH SIGNAL PITCH DETECTOR USING PREDICTION ERROR DATA
[75] Inventor: Bishnu Saroop Atal, Murray Hill
[73] Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill
[22] Filed: July 9, 1971
[21] Appl. No.: 161,173
[52] U.S. Cl.: 179/1 SA
[51] Int. Cl.: G10l 1/04
[58] Field of Search: 179/1 SA, 15.55 R; 325 33 A

[56] References Cited
UNITED STATES PATENTS
2,732,424  1/1956  Oliver  179/15.55 R
3,026,375  3/1962  Graham  179/1 SA
3,420,955  1/1969  Noll  179/1 SA
3,437,757  4/1969  Coker  179/1 SA
3,405,237  10/1968  David  179/1 SA

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Bradford Leaheey
Attorney: R. J. Guenther and William L. Keefauver

8 Claims, 2 Drawing Figures

[Figure: block diagram showing low-pass filter, sampler, clock, subtractor network, adaptive predictor, prediction parameter computer, low-pass filter, rectifier, peak picker, threshold detector (pitch pulses out), two mean-square networks, divider, and threshold detector (voiced-unvoiced signal).]

This invention is concerned with the analysis of complex signals, and particularly with the determination of the fundamental frequency, or period, of a complex periodic signal, such as a voiced speech signal. Its principal objectives are to simplify the measurement of pitch frequency and to improve the reliability of the measure.

BACKGROUND OF THE INVENTION

A number of arrangements for reducing the channel capacity required for the transmission of complex signals, such as speech signals, have been proposed. One of the best known of these is the vocoder. More recently, techniques for removing inherent signal redundancy through the use of linear prediction have been described. In all of these arrangements, a speech wave is analyzed to determine its significant characteristics, and coded information concerning these characteristics is transmitted instead of the speech signal itself. At a receiver station a synthetic speech signal is developed from the coded information.

In general, a different set of coded signal information is employed in each type of bandwidth compression system. However, virtually all employ one characteristic of the speech signal, namely, its pitch frequency. This characteristic denotes the fundamental frequency at which the vocal cords vibrate during the production of different voiced speech sounds. Most speech bandwidth compression systems also employ coded information to identify a speech signal as voiced or unvoiced. Some combine the two forms of information so that the pitch signal inherently specifies the voicing condition.

FIELD OF THE INVENTION.

A number of different proposals for automatically measuring and encoding the pitch characteristic of a speech signal are known and used in the art. Some rely on simple filtering, some on signal correlation, some on formant detection and tracking, and others on a transformation of the logarithm of the spectrum of a speech signal, the so-called cepstrum of the signal. All of these arrangements, however, operate on the speech signal itself and in one way or another strive to find peak values in the signal, or in a modification of it, which identify the pitch characteristic. Unfortunately, peaks due to formants, particularly the first formant of a speech signal, are often stronger than a peak developed to indicate pitch. If the two peaks are close together, it is difficult to determine which is which. Consequently, even the most sophisticated pitch detectors are subject to error and do not always correctly characterize the pitch frequency of a signal.

It is thus another object of this invention to capitalize on a unique property of a voiced speech signal to develop a measure of the pitch frequency of the signal that is unambiguous and which is entirely independent of the formant character of the speech signal.

SUMMARY OF THE INVENTION

Analysis of a complex speech signal to determine its pitch frequency is, in accordance with the invention, based on an analysis of the error between a predicted value of the speech signal based on its past sample values and its actual value at that moment. The time interval represented by the number of samples used to obtain the predicted value is typically 1 msec. Due to the short memory used in the prediction process, the predicted signal values represent, in large measure, the formant structure of the speech signal. The pitch analysis arrangement of the invention is particularly effective because, in developing a difference signal, i.e., the prediction error signal, the formant structure of the signal is removed from the input signal. Yet, since the pitch period in speech signals ranges typically from 3 msec to 20 msec, the prediction of the pitch structure, based on 1 msec of past speech, is completely negligible. Thus, pitch information is retained in the prediction error signal. Consequently, there is little or no interference from the formant structure and a peak picking operation is effective in developing a measure of the pitch character of the input signal.
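The time scales quoted above can be restated as sample counts. The following minimal sketch assumes the sampling frequency of approximately 10 kHz given in the detailed description; the function name is illustrative and does not appear in the patent.

```python
def ms_to_samples(duration_ms, sample_rate_hz=10_000):
    """Convert a duration in milliseconds to a whole number of samples.

    The 10 kHz default is the approximate rate named in the detailed
    description; any other rate may be passed in.
    """
    return round(duration_ms * sample_rate_hz / 1000)


# Predictor memory of about 1 msec versus the 3 to 20 msec pitch range.
predictor_memory = ms_to_samples(1)
shortest_pitch_period = ms_to_samples(3)
longest_pitch_period = ms_to_samples(20)
```

Under this assumption the predictor looks back over far fewer samples than even the shortest pitch period, which is consistent with the statement that the pitch structure is not predicted away.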

A feature of the invention is the additional use of prediction error samples to develop a voiced-unvoiced signal indication. In accordance with the invention, a voicing decision is based on the ratio of the mean-squared value of input signal samples to the mean-squared value of corresponding prediction error samples.

This invention will be more fully understood from the following detailed description of an illustrative embodiment of it taken together with the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic diagram of a speech signal analysis system which illustrates the principles of the invention, and

FIG. 2 is an illustration of the waveform of a segment of a voiced speech signal, the positions of detected pitch pulses in the voiced speech signal, as shown by vertical lines, and a segment of unvoiced speech.

DETAILED DESCRIPTION

A signal analysis arrangement which illustrates the principles of the invention is shown in FIG. 1. Speech signals supplied from any desired source are delivered to the analyzer and passed through low-pass filter 10. Filter 10 typically has a cutoff frequency in the neighborhood of 5 kHz. The resultant signal is then sampled at a frequency of approximately 10 kHz in sampler 11 under control of signals from clock 12.

Speech samples, s_n, thus derived are supplied to storage unit 13 which maintains them in order, typically in blocks of 200 samples, i.e., s_1, s_2, ..., s_200. Blocks or frames of samples are periodically keyed out of storage unit 13, for example, under control of a signal from clock 12, and delivered to adaptive predictor 14, prediction parameter computer 15, and to subtractor network 16.
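The blocking performed by storage unit 13 can be sketched in software as follows. This is an illustration only: the patent does not state whether successive blocks overlap, so the sketch assumes consecutive, non-overlapping blocks and drops any incomplete trailing block. The function name is hypothetical.

```python
def split_into_frames(samples, frame_length=200):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Assumption (not stated in the patent): frames do not overlap, and a
    trailing remainder shorter than frame_length is discarded.
    """
    frame_count = len(samples) // frame_length
    return [
        samples[i * frame_length:(i + 1) * frame_length]
        for i in range(frame_count)
    ]


# 450 samples yield two full frames of 200; the last 50 are dropped.
frames = split_into_frames(list(range(450)))
```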

Adaptive predictor 14 operates on supplied signal samples to predict the present value of each sample on the basis of a weighted summation of a number of prior sample values. The prediction operation is carried out on a sample-by-sample basis and predictor 14 is periodically supplied with a new frame of samples from storage unit 13. An adaptive predictor suitable for use in the system of this invention is described in detail in a copending application of B. S. Atal, Ser. No. 753,408, filed Aug. 19, 1968, now U.S. Pat. No. 3,631,520.
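The weighted summation of prior sample values may be illustrated as below. This is a minimal sketch, not the predictor of the referenced application: the function name, the convention that coefficients[0] weights the immediately preceding sample, and the treatment of samples before the start of the sequence as zero are all assumptions made for the example.

```python
def predict_samples(samples, coefficients):
    """Predict each sample as a weighted sum of the samples preceding it.

    predicted[n] = sum over k of coefficients[k - 1] * samples[n - k],
    for k = 1 .. len(coefficients). Samples before the start of the
    sequence are treated as zero (an assumption for this sketch).
    """
    order = len(coefficients)
    predicted = []
    for n in range(len(samples)):
        value = 0.0
        for k in range(1, order + 1):
            if n - k >= 0:
                value += coefficients[k - 1] * samples[n - k]
        predicted.append(value)
    return predicted


# A linear ramp is predicted exactly by 2*s[n-1] - s[n-2] once two
# prior samples are available.
ramp_prediction = predict_samples([1.0, 2.0, 3.0, 4.0], [2.0, -1.0])
```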

To accommodate the constantly changing character of the input speech signal, predictor 14 is controlled to adapt it to the current signal condition. It has been found sufficient to readjust the values of the parameters used to control the predictor at intervals comparable to those of a pitch period of the signal. Since the exact pitch interval is not available (although the pitch output signal of the system may be used in a feedback arrangement to approximate the interval of a later pitch period), readjustment of the parameter values at intervals corresponding approximately to the time of 200 samples is entirely satisfactory. This corresponds to a time interval of approximately 20 msec.

Prediction parameter computer 15 thus operates on applied speech samples from unit 13 to develop a sequence of parameter signals a_1, a_2, a_3, ..., a_n, which are used periodically to adjust predictor 14. The parameter values are selected to minimize the mean-squared prediction error of the system. An extensive discussion of the relation of the parameter signals to the input signal, their development, and the manner in which they are used to control the predictor is given in the above-mentioned copending patent application. Parameter signals from computer 15 are developed well in advance of the time that a block of signals is processed in predictor 14 because of the delay inherent in the prediction operation. Typically, parameter control signals are developed within an interval corresponding to the time of approximately 60 samples.
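The patent defers the actual parameter computation to the referenced application. Purely as an illustration of selecting parameters that minimize the mean-squared prediction error, the sketch below substitutes a widely used technique, the autocorrelation method solved by the Levinson-Durbin recursion; it should not be read as the method of computer 15. Function names and the predictor order are assumptions.

```python
def autocorrelation(samples, max_lag):
    """Return the autocorrelation values r[0] .. r[max_lag] of one frame."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (coefficients, residual_energy), where coefficients[k - 1]
    weights the sample k steps in the past.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # frame is silent or already perfectly predicted
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Autocorrelation of a first-order process with coefficient 0.5:
# the recursion recovers that coefficient and a zero second coefficient.
coefficients, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In use, autocorrelation would be applied to each 200-sample block and the resulting coefficients handed to the predictor for that block.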

Sample values developed by predictor 14 are subtracted in network 16 from the actual value of corresponding signal samples delivered from storage unit 13 to the subtractor. The resultant difference signal represents the error in predicting the value of the signal. It is accordingly called a prediction error signal. Evidently, appropriate delay is provided, for example, in the readout of samples from storage unit 13 or in their delivery to subtractor 16, to allow time for all predictor operations to be completed. Suffice it to say that all of the described operations are carried on in synchronism in a conventional manner.
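The subtraction performed in network 16 can be sketched as follows. The sketch is self-contained and recomputes the weighted-sum prediction inline; the function name and the zero treatment of samples before the start of the sequence are assumptions.

```python
def prediction_error(samples, coefficients):
    """Return actual minus predicted value for every sample.

    The prediction for sample n is the weighted sum of the preceding
    samples, with coefficients[k - 1] weighting the sample k steps back.
    Samples before the start of the sequence are treated as zero.
    """
    errors = []
    for n, actual in enumerate(samples):
        predicted = sum(
            c * samples[n - k]
            for k, c in enumerate(coefficients, start=1)
            if n - k >= 0
        )
        errors.append(actual - predicted)
    return errors


# For a ramp and the predictor 2*s[n-1] - s[n-2], the only nonzero
# error is at the onset, where no past samples exist.
ramp_error = prediction_error([1.0, 2.0, 3.0, 4.0], [2.0, -1.0])
```

The example shows the general behaviour relied on in the text: where the signal follows the predictor's model the error is small, and it is large only where something new enters the signal.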

It is of importance to recognize that the values of signal samples are predicted largely on the basis of their formant constituency. Predicted signals, therefore, represent essentially the formant structure of the input signal. Since the predicted signal values are subtracted from actual signal values, the prediction error signal at the output of subtractor network 16 is essentially devoid of all formant information. Yet, the prediction error signal has been found to preserve, and indeed to denote, the pitch character of the applied signal.

Prediction error signals from subtractor 16 are passed through low-pass filter 17. Filter 17 is constructed with a relatively low cutoff frequency since the fundamental pitch of the applied signal generally is in the lower portion of the band. Elimination of higher frequency portions aids in isolating the pitch signal.
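The patent does not specify the design or cutoff of filter 17. As a stand-in only, the sketch below uses a causal moving average, one of the simplest low-pass operations; the function name and the window width are assumptions chosen for illustration.

```python
def moving_average_lowpass(samples, width=5):
    """Smooth a sequence with a causal moving average of the given width.

    During the first width - 1 samples the average is taken over the
    samples seen so far. This is a simple stand-in, not the filter of
    the patent.
    """
    smoothed = []
    running_sum = 0.0
    for n, value in enumerate(samples):
        running_sum += value
        if n >= width:
            running_sum -= samples[n - width]
        smoothed.append(running_sum / min(n + 1, width))
    return smoothed


# A rapidly alternating component is suppressed; a constant passes through.
alternating = moving_average_lowpass([1.0, -1.0, 1.0, -1.0], width=2)
constant = moving_average_lowpass([2.0, 2.0, 2.0, 2.0], width=2)
```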

In accordance with the invention, the positions of individual pitch pulses in the applied signal are determined by locating the samples for which the prediction error is large. Samples delivered from filter 17 thus have amplitudes that are proportional to the difference between the applied signal sample and the predicted signal. It is necessary, therefore, only to seek the fundamental frequency of the prediction (error) signal. This may be done using any desired fundamental frequency detector 18 of any desired construction. A suitable detector includes a half-wave rectifier 19, employed to retain positive peaks only of the signal in order to simplify later operations. The rectified signal is delivered to peak picking network 20, which seeks the largest sample in each frame of signals. Such peak picking arrangements are well known to those skilled in the art and are frequently used in pitch detection arrangements, particularly those of the cepstrum type. Peak signals thus developed are passed through threshold detector 21, adjusted to a level selected to prevent minor peaks from reaching the output of the analyzer. The threshold is adjusted to accommodate the true fundamental frequency peaks determined, for example, from experience. The resulting sequence of pitch pulses is indicative of the fundamental frequency or period of the applied speech signal and may be used in any desired fashion.
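The chain of rectifier 19, peak picker 20 and threshold detector 21 may be sketched as below. The function names, the frame length and the threshold value are assumptions; the patent leaves the threshold to be set from experience.

```python
def half_wave_rectify(samples):
    """Keep positive values and replace negative values with zero."""
    return [max(s, 0.0) for s in samples]


def pick_frame_peak(frame):
    """Return (index, value) of the largest sample in one frame."""
    index = max(range(len(frame)), key=lambda i: frame[i])
    return index, frame[index]


def detect_pitch_pulses(error_samples, frame_length, threshold):
    """Return positions of frame peaks that exceed the threshold."""
    rectified = half_wave_rectify(error_samples)
    pulses = []
    for start in range(0, len(rectified), frame_length):
        frame = rectified[start:start + frame_length]
        index, value = pick_frame_peak(frame)
        if value > threshold:
            pulses.append(start + index)
    return pulses


# Three frames of four samples; the middle frame has only a minor peak
# and the large negative excursion is removed by the rectifier.
error = [0.1, 0.9, -2.0, 0.2, 0.05, 0.1, -0.3, 0.0, 0.8, 0.1, 0.0, 0.2]
pulse_positions = detect_pitch_pulses(error, frame_length=4, threshold=0.5)
```

The spacing between successive pulse positions then gives the pitch period in samples.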

Alternatively, as previously described in the art, the fundamental frequency detector may include an autocorrelator followed by a peak picker and a threshold detector.
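This alternative can be sketched as follows. The lag search range, the normalization by signal energy and the threshold ratio are assumptions introduced for the example and are not taken from the patent.

```python
def autocorrelation_period(error_samples, min_lag, max_lag, threshold_ratio=0.3):
    """Estimate the period as the lag of the largest autocorrelation peak.

    Returns the lag in samples, or None when the peak, relative to the
    signal energy, falls below threshold_ratio (an assumed criterion).
    """
    n = len(error_samples)
    energy = sum(s * s for s in error_samples)
    if energy == 0.0:
        return None
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        value = sum(
            error_samples[i] * error_samples[i + lag] for i in range(n - lag)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None or best_value / energy < threshold_ratio:
        return None
    return best_lag


# An idealized error signal: unit pulses every 5 samples.
pulse_train = [1.0 if i % 5 == 0 else 0.0 for i in range(30)]
period = autocorrelation_period(pulse_train, min_lag=3, max_lag=12)
```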

FIG. 2 illustrates a typical interval of a speech signal. A voiced speech segment is shown in line A. Line B illustrates the sequence of pulses derived from fundamental frequency detector 18 as the output signal of the analyzer system. Line C of the figure illustrates a typical unvoiced segment of speech.

To assure that a clear distinction between voiced and unvoiced signal segments is available, a voiced-unvoiced decision signal is also produced. In accordance with the invention, the voiced-unvoiced decision is based on the ratio of the mean-squared value of speech samples to the mean-squared value of prediction error samples. It has been found that this ratio is considerably smaller for unvoiced speech sounds than for voiced speech sounds, typically by a factor of approximately 10.

Accordingly, speech samples from sampler 11 are delivered to mean-squared network 22 and prediction error samples from subtractor 16 are delivered to mean-squared network 23. Networks for deriving a signal proportional to the mean value of a sequence of samples are well known in the art and are frequently used in acoustic signal processing apparatus. A typical network includes an arrangement for developing a signal proportional to the square of each signal sample, an adding network for summing a sequence of squared signal values, and a divider network for developing a signal proportional to the average, or mean value, of the summed squared signals.

Two signals proportional, respectively, to the mean-squared value of speech samples and the mean-squared value of prediction error samples are delivered to divider network 24 which produces as its output the quotient of the two signal values. The quotient signal is thereupon delivered to threshold detector 25, which is arranged to develop a first signal for quotient values greater than 10, as an indication of a voiced signal interval, and a second signal for quotients less than 10, as an indication of an unvoiced signal interval. Output signals from detector 25 may be used in any desired fashion to indicate the voicing character of the input signal.
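The operation of networks 22 and 23, divider 24 and threshold detector 25 may be sketched as below. The threshold of 10 is the value given above, and a quotient exactly equal to the threshold is treated as voiced, following claim 7. The function names and the handling of a zero error power are assumptions.

```python
def mean_square(samples):
    """Square each sample, sum the squares, and divide by the count."""
    return sum(s * s for s in samples) / len(samples)


def is_voiced(speech_samples, error_samples, threshold=10.0):
    """Classify an interval from the speech-to-error mean-square ratio."""
    error_power = mean_square(error_samples)
    if error_power == 0.0:
        # Assumption: with no prediction error the quotient is unbounded,
        # so any interval that contains signal is treated as voiced.
        return mean_square(speech_samples) > 0.0
    return mean_square(speech_samples) / error_power >= threshold


# Quotient 16: well predicted, classified as voiced.
voiced_example = is_voiced([4.0, -4.0, 4.0, -4.0], [1.0, -1.0, 1.0, -1.0])
# Quotient near 1.2: poorly predicted, classified as unvoiced.
unvoiced_example = is_voiced([1.0, -1.0], [0.9, -0.9])
```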

It will be evident to those skilled in the art that the fundamental frequency determination arrangement of the invention, together with the voicing decision arrangement, greatly enhances the reliability with which two important characteristics of a speech signal are determined. This increased reliability is due primarily to the virtual absence of formant structure in the signal at the time the pitch measurement is made. Furthermore, it will be apparent that the fundamental frequency detector of the invention is particularly applicable to use in a speech transmission system or a speech analysis system in which a linear prediction arrangement is used. In such cases, it is evident that the prediction error signal delivered to subtractor 16 may be derived from the predictor used in coding the speech signals.

Furthermore, it will be apparent that the voicing decision signal may be used in conjunction with other criteria, such as the spectral balance of low frequencies related to high frequencies, to make the voiced-unvoiced decision more reliable.

What is claimed is:

1. A signal analyzer for determining the fundamental period of a speech signal, which comprises,

adaptive predictor means supplied with samples of said speech signal for predicting the present value of each sample on the basis of a weighted summation of a number of prior sample values of said speech signal,

means for subtracting said predicted speech value from the actual speech value to develop a difference signal, and

means for determining the fundamental frequency of said difference signal as an indication of the fundamental period of said speech signal.

2. A signal analyzer as defined in claim 1, wherein said means for determining the fundamental frequency of said difference signal comprises,

means for determining the frequency of occurrence of difference signal maxima above a prescribed threshold.

3. A signal analyzer as defined in claim 1, wherein said means for determining the fundamental frequency of said difference signal comprises,

means for autocorrelating said difference signal for developing an autocorrelation signal representative of the periodic character of said difference signal, and

means for detecting the location of the peak value of said autocorrelation signal.

4. Apparatus for determining the fundamental period of a speech signal, which comprises,

means for developing an estimate of the present value of a speech signal on the basis of past values of said speech signal,

means for developing a signal representative of the difference between said signal estimate and the true present value of said speech signal, and

means for determining the fundamental frequency of said difference signal to develop a signal representative of the fundamental period of said speech signal.

5. Apparatus for determining the fundamental period of a speech signal, which comprises,

adaptive predictor means supplied with samples of said speech signal for developing an estimate of the momentary value of said speech signal from previously supplied samples,

means for developing a prediction error signal from the difference between said predicted signal estimate and the corresponding momentary value of samples of said speech signal,

means for identifying prediction error samples whose magnitudes are above a prescribed threshold, and

means for utilizing the frequency of occurrence of said identified error samples as a measure of the fundamental period of said speech signal.

6. Apparatus for analyzing the character of a speech signal, which comprises, in combination,

predictor means supplied with samples of a speech signal for developing an estimate of the momentary value of said signal from previously supplied samples,

means for developing prediction error signal samples from the difference between samples of said signal estimate and the corresponding momentary value of samples of said speech signal,

means for identifying prediction error samples whose magnitudes are above a prescribed threshold,

means for developing a first signal proportional to the mean-squared value of said speech samples,

means for developing a second signal proportional to the mean-squared value of corresponding ones of said error samples,

means for developing a signal proportional to the ratio of said first to said second mean-squared signals,

means for utilizing the frequency of occurrence of said identified threshold error samples as a measure of the fundamental period of said speech signal, and

means for utilizing said ratio of first and second mean-squared signals as a measure of the voicing characteristic of said speech signal.

7. Apparatus for analyzing the character of a speech signal as defined in claim 6, wherein,

values of said ratio of mean-squared signals equal to or greater than a prescribed threshold are used to classify said speech signal as voiced, and wherein values of said ratio of mean-squared signals less than said threshold are used to classify said speech signal as unvoiced.

8. In a pitch analysis arrangement for speech signals, the combination of,

means for developing a signal representative of the formant structure of an applied speech signal,

means for removing said formant representative signal from said speech signal to produce a signal essentially devoid of all formant information,

means for measuring the period of said formant devoid signal, and

means for determining the voicing character of said speech signal on the basis of the power in said speech signal and the power in said formant devoid signal.
